132 research outputs found

    Scaling Heterogeneous Databases and the Design of Disco

    Access to large numbers of data sources introduces new problems for users of heterogeneous distributed databases. End users and application programmers must deal with unavailable data sources. Database administrators must deal with incorporating new sources into the model. Database implementors must deal with translating queries between query languages and schemas. The Distributed Information Search COmponent (Disco) addresses these problems. Query processing semantics are developed to process queries over data sources that do not return answers. Data modeling techniques manage connections to data sources. The component interface to data sources flexibly handles different query languages and translates queries. This paper describes (a) the distributed mediator architecture of Disco, (b) its query processing semantics, (c) the data model and its modeling of data source connections, and (d) the interface to underlying data sources.

    Partial Answers for Unavailable Data Sources

    Projet RODIN. Many heterogeneous database system products and prototypes exist today, and they will soon be deployed in a wide variety of environments. All existing systems suffer from an Achilles' heel: if some sources are unavailable when accessed, these systems either silently ignore them or generate an error, i.e., they "ungraciously fail". This behavior is improper in environments where there is a non-negligible probability that data sources cannot be accessed (e.g., the Internet). In this paper, we propose a novel approach to this issue where, in the presence of unavailable data sources, the answer to a query is a "partial answer". A partial answer is itself a query that results from the partial evaluation of the original query; it is composed of the data that have been obtained and processed during the evaluation and of a representation of the unfinished work to be done. Partial answers can be resubmitted to the system in order to obtain the final answer to the original query, or another partial answer. Additionally, the application program can extract information from a partial answer through the use of a secondary query, called a "parachute query". In this paper we give a taxonomy of partial answers and parachute queries. We present algorithms for the evaluation of queries in the presence of unavailable data sources, and we describe an implementation.
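
    The partial-answer idea described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual algorithm: the function names, the dictionary-based query representation, and the example sources are all assumptions made for clarity.

```python
# Hypothetical sketch of a "partial answer": results from reachable sources
# plus a residual query representing the unfinished work. All names here are
# illustrative, not from the paper.

def evaluate(query_parts, sources):
    """Evaluate each sub-query against its source; unavailable sources
    leave residual work behind instead of failing the whole query."""
    data, residual = {}, {}
    for name, subquery in query_parts.items():
        source = sources.get(name)
        if source is None:           # source unavailable right now
            residual[name] = subquery
        else:
            data[name] = source(subquery)
    return data, residual            # the partial answer: data + unfinished work

# Two sources; "hotels" is down on the first attempt.
parts = {"flights": "SELECT * FROM flights", "hotels": "SELECT * FROM hotels"}
up = {"flights": lambda q: ["AF123", "BA456"]}

data, residual = evaluate(parts, up)
assert residual == {"hotels": "SELECT * FROM hotels"}

# A "parachute query" extracts what it can from the partial answer.
flights_only = data.get("flights", [])

# Resubmitting the residual once the source is back yields the final answer.
up["hotels"] = lambda q: ["Ritz"]
more, residual = evaluate(residual, up)
data.update(more)
assert residual == {}
```

    Resubmission is thus just a second call to the evaluator over the residual work, which mirrors the paper's point that a partial answer is itself a query.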

    Synergistic Integration of Large Language Models and Cognitive Architectures for Robust AI: An Exploratory Analysis

    This paper explores the integration of two AI subdisciplines employed in the development of artificial agents that exhibit intelligent behavior: Large Language Models (LLMs) and Cognitive Architectures (CAs). We present three integration approaches, each grounded in theoretical models and supported by preliminary empirical evidence. The modular approach, which introduces four models with varying degrees of integration, makes use of chain-of-thought prompting and draws inspiration from augmented LLMs, the Common Model of Cognition, and the simulation theory of cognition. The agency approach, motivated by the Society of Mind theory and the LIDA cognitive architecture, proposes the formation of agent collections that interact at micro and macro cognitive levels, driven by either LLMs or symbolic components. The neuro-symbolic approach, which takes inspiration from the CLARION cognitive architecture, proposes a model where bottom-up learning extracts symbolic representations from an LLM layer and top-down guidance uses symbolic representations to direct prompt engineering in the LLM layer. These approaches aim to harness the strengths of both LLMs and CAs while mitigating their weaknesses, thereby advancing the development of more robust AI systems. We discuss the tradeoffs and challenges associated with each approach. Comment: AAAI 2023 Fall Symposium.

    Dynamic Query Operator Scheduling for Wide-Area Remote Access

    Distributed databases operating over wide-area networks, such as the Internet, must deal with the unpredictable performance of communication. The response times of accessing remote sources can vary widely due to network congestion, link failure, and other problems. In such an unpredictable environment, the traditional iterator-based query execution model performs poorly. We have developed a class of methods, called query scrambling, for dealing explicitly with the problem of unpredictable response times. Query scrambling dynamically modifies query execution plans on the fly in reaction to unexpected delays in data access. In this paper we focus on the dynamic scheduling of query operators in the context of query scrambling. We explore various choices for dynamic scheduling and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources and delayed or possibly bursty arrivals of data. Our performance results show that scrambling rescheduling is effective in hiding the impact of delays on query response time for a number of different delay scenarios.
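
    The core rescheduling intuition can be shown in miniature: when a planned operator's source is delayed, run other ready operators first instead of blocking. This is a deliberately simplified sketch over a flat operator list; real query scrambling reorders operator trees, and every name below is an assumption, not the paper's implementation.

```python
# Illustrative sketch of scrambling-style rescheduling: ready operators run
# first, operators waiting on delayed sources are deferred. Not the paper's
# actual algorithm; names and structures are invented for this example.

def scramble_schedule(operators, available):
    """operators: (name, source) pairs in the originally planned order.
    available: the set of sources currently responding.
    Returns a new execution order with delayed operators pushed back."""
    ready = [op for op in operators if op[1] in available]
    delayed = [op for op in operators if op[1] not in available]
    return ready + delayed

# Planned order runs siteA first, but siteA is currently delayed.
plan = [("join1", "siteA"), ("join2", "siteB"), ("join3", "siteC")]
order = scramble_schedule(plan, available={"siteB", "siteC"})
assert [name for name, _ in order] == ["join2", "join3", "join1"]
```

    The useful work on siteB and siteC thus overlaps siteA's delay, which is the effect the simulation results above measure.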

    Leveraging Mediator Cost Models with Heterogeneous Data Sources

    Projet RODIN. Distributed systems require declarative access to diverse sources of information. One approach to solving this heterogeneous distributed database problem is based on mediator architectures. In these architectures, mediators accept queries from users, process them with respect to wrappers, and return answers. Wrappers provide access to underlying data sources. To process queries efficiently, the mediator must optimize the plan used for processing the query. In classical databases, cost-estimate-based query optimization is an effective method for optimization. In heterogeneous distributed databases, cost-estimate-based query optimization is difficult to achieve because the underlying data sources do not export cost information. This paper describes a new method that permits the wrapper programmer to export cost estimates (cost estimate formulas and statistics). Describing all cost estimates may be impossible for the wrapper programmer due to lack of information, or burdensome due to the amount of information required. We ease this responsibility by leveraging the generic cost model of the mediator with specific cost estimates from the wrappers. This paper describes the mediator architecture, the language for specifying cost estimates, the algorithm for blending cost estimates during query optimization, and experimental results based on a combination of analytical formulas and real measurements of an object database system.
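
    The blending idea, preferring a wrapper's exported estimate and falling back to the mediator's generic model, can be sketched as follows. The cost formulas, operator names, and lookup scheme are illustrative assumptions, not the paper's cost language.

```python
# Minimal sketch of blending wrapper-specific cost estimates with a generic
# mediator cost model. Formulas and names below are invented for illustration.

GENERIC_COSTS = {
    "scan": lambda n: 1.0 * n,          # generic: linear scan cost
    "join": lambda n: 0.01 * n * n,     # generic: quadratic join cost
}

def cost(op, cardinality, wrapper_costs=None):
    """Prefer the wrapper's exported cost formula for this operator;
    fall back to the mediator's generic model when none was exported."""
    model = (wrapper_costs or {}).get(op) or GENERIC_COSTS[op]
    return model(cardinality)

# A wrapper over an indexed source exports a cheaper scan formula,
# but says nothing about joins.
wrapper = {"scan": lambda n: 0.1 * n}

assert cost("scan", 100, wrapper) == 10.0    # wrapper-specific estimate wins
assert cost("join", 100, wrapper) == 100.0   # generic fallback: 0.01 * 100 * 100
```

    The optimizer can then compare candidate plans with a single cost function, even though the estimates behind it come from two different levels of the architecture.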

    Equal Time for Data on the Internet with WebSemantics

    Projet RODIN. Abstract available in the PDF file.